TST[string]: update expecteds for using_string_dtype to fix xfails #61727

Open
wants to merge 1 commit into
base: main

jbrockmendel
Member

It isn't 100% obvious that the new repr for Categoricals is an improvement, but it's non-crazy. One of the remaining xfails is for eval(repr(categorical_index)) round-tripping, which won't be fixable unless we revert to the old repr behavior.

I'm pretty sure that the fix in test_astype_dt64_to_string is correct and the test is just wrong, but it merits a close look.

That leaves 12 xfails, including the one un-fixable round-trip one that we'll just remove. Of those...

  • test_join_on_key I think is surfacing an unrelated bug that I'll take a look at
  • test_to_dict_of_blocks_item_cache is failing because we don't make series.values read-only for ArrowStringArray. I think @mroeschke can comment on how viable/important that is.
  • test_string_categorical_index_repr is about CategoricalIndex reprs that span multiple lines; with the StringDtype the padding changes.
  • 4 in pandas/tests/io/json/test_pandas.py that I'm hoping @WillAyd can take point on
  • test_to_string_index_with_nan: there's a MultiIndex level that reprs with a nan instead of NaN. Not a huge deal, but having mixed-and-matched nans/NaNs in the repr is weird.
  • test_from_records_sequencelike I don't have a good read on
  • tests.base.test_misc::test_memory_usage is skipped instead of xfailed, but the reason says that it "doesn't work properly" for arrow strings, which seems xfail-adjacent. Instead of skipping, can we update the expected behavior? cc @jorisvandenbossche

(Update: looks like I missed one in test_http_headers and another in test_fsspec)

@WillAyd
Member

WillAyd commented Jun 30, 2025

The JSON issues stem from the fact that:

>>> pd.Series([None, '', 'c']).astype(object)

yields different behavior with/without the future string dtype. In the "old" world, this would preserve the value of None but in the new world, None gets cast to a missing value indicator when contained within a series of string values.

In theory we could try to work around those semantics by natively supporting an object type in the JSON reader, but that's a ton of effort and I don't think it's worth it, given that JSON does not natively support object storage.

@jbrockmendel
Member Author

jbrockmendel commented Jun 30, 2025

thanks, will update those tests' expecteds


2 participants